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State of the Art and Applications 
Pierre Comon 

Lab. I3S, CNRS, BP121, F-06903 Sophia- Antipolis cedex, France 

Abstract 

In this paper, we present a partial survey of the tools borrowed from 
tensor algebra, which have been utilized recently in Statistics and Signal 
Processing. It is shown why the decompositions well known in linear al- 
gebra can hardly be extended to tensors. The concept of rank is itself 
difficult to define, and its calculation raises difficulties. Numerical algo- 
rithms have nevertheless been developed, and some are reported here, but 
their limitations are emphasized. These reports hopefully open research 
perspectives for enterprising readers. 



1 Introduction 

Applications. The decomposition of arrays of order higher than 2 has proven 
to be useful in a number of applications. The most striking case is perhaps Factor 
Analysis, where statisticians early identified difficult problems, tackling the lim- 
its of linear algebra. The difficulty lies in the fact that such arrays may have more 
factors than their dimensions. Next, data are often arranged in many- way ar- 
rays, and the reduction to 2-way arrays sometimes results in a loss of information. 
Lastly, the solution of some problems, including the so-called Blind Source Sepa- 
ration (BSS) generally requires the use of High-Order Statistics (HOS), which are 
intrinsically tensor objects [59] (McCullagh 1987). When second order statistics 
suffice to establish identifiability, the corresponding algorithms are quite sensi- 
tive to model uncertainties [57] (Liavas Regalia and Delmas 1999), so that the 
complementary use of HOS statistics is often still recommended. 

BSS finds applications in Sonar, Radar [T3J (Chaumettc Comon and Mullcr 
1993), Electrocardiography [32] (DeLathauwer DcMoor at alterae 2000), Speech 
EE] [52] [26] (Nguyen-Thi and Jutten 1996; Lee and Lewicki 1999; DeLathauwer 
1997), and Telecommunications [35] [3J] [ZD] d] W\ (Ferreol and Chevalier 
2000; Gassiat and Gamboa 1997; Van der Veen 1996; Castedo and Macchi 1997; 
Grellier and Comon 2000), among others. In particular, the surveillance of radio- 
communications in the civil context, or interception and classification in military 
applications, resort to BSS. Moreover, in Mobile Communications, the mitigation 
of interfering users and the compensation for channel fading effects are now 
devised with the help of BSS; this is closely related to the general problem of 
Blind Deconvolution. 

High-Order Factor Analysis is applied in many areas including Economy, 
Psychology [11] [10] (Carroll and Chang 1970; Carroll Pruzansky and Kruskal 



1980), Chemometrics [37] g] (Geladi et alterae 1989; Bro 1997), and Sensor 
Array Processing [18] [70] [65] (Comon 1989; Van der Veen and Paulraj 1996; 
Sidiropoulos Bro and Giannakis 2000; Comon 2000). Other fields where array 
decompositions can turn out to be useful include Exploratory Analysis [45| (Jones 
and Sibson 1987), Complexity Analysis [50] g3J (Kruskal 1977; Howell 1978), and 
Sparse Coding [H] (Hyvarinen Hoyer and Oja 1999). 

Bibliographical survey. Bergman [1] (1969) and Harshman [41] (1970) 
were the first to notice that the concept of rank was difficult to extend from 
matrices to higher order arrays. Carroll [TT] (1970) provided the first canonical 
decomposition algorithm of a three-way array, later referred to as Candecomp 
model. Several years later, Kruskal [50] (1977) conducted a detailed analysis 
of uniqueness, and related several definitions of rank. The algorithm Candelinc 
was devised by Carroll and others in the eighties [10] (Carroll et alterae 1980); 
it allowed to compute a canonical decomposition subject to a priori linear con- 
straints. 

Leurgans and others [54] (1993) derived sufficient idcntifiability conditions for 
the 3-way array decomposition; as opposed to Kruskal, his proof was constructive 
and yielded a numerical algorithm running several matrix SVD's. 

Instead of finding an exact decomposition of a d— way array, which requires 
more than d terms (as we shall subsequently see), Comon proposed [T5J (1991) 
to approximately decompose it into d terms. The problem was then reduced to 
finding an invertible linear transform (change of coordinates); see [TB] [0J (Comon 
1994; Cardoso 1993) and references therein. This decomposition is now referred 
to as " Independent Component Analysis" (ICA), whereas the exact Canonical 
Decomposition is sometimes referred to as underdetermined or over-complete ICA 
[52] PI (Lee et alterae 1999; Comon 1998). 

The terminology of ICA is meaningful in the context of Signal Processing 
and BSS [46] [16] [8] (Juttcn and Hcrault 1991; Comon 1994; Cardoso 1999). 
Constructive algorithms for ICA either proceed by sweeping the pairs of indices 
OH EES] M (Comon 1989; Comon 1991; Cardoso and Souloumiac 1993; Comon 
1994), or are of iterative nature, like power methods [55] [JO] (DcLathauwer 
Comon and others 1995; Kofidis and Regalia 2000), gradient descents [55J (Mac- 
chi and Moreau 1997), or Robbins-Monro algorithms [J0J [OJJ (Jutten and Hrault 
1991; Nguyen-Thi Jutten and others 1996). Some less efficient early methods 
were based on contracted versions of the array [6] (Cardoso 1989) or on noiseless 
observations [T4] (Comon 1989). 

A solid account on decompositions of 3-way arrays can also be found in 
DcLathauwer's PhD thesis [25] (1997); an interesting tool defined therein is the 
HOSVD 2>Q (DcLathauwer and others 1993), generalizing the concept of SVD to 
arrays of order 3, in a different manner compared to Carroll, but quite similar to 
the Tuckals decomposition [55] [55] [37J (Levin 1965; Tucker 1965; Geladi 1989). 
A good survey of rank issues can also be found in [I0J (Kofidis et alterae 2000). 
An account on idcntifiability issues can be found in [5J (Cao and Liu 1996). 
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2 Tensors 



2.1 Terminology 

The order of an array refers to the number of its ways; the entries of an array 
of order d are accessed via d indices, say ii-.id, with every index i a ranging from 
1 to n a . The integer n a is one of the d dimensions of the array. For instance, a 
matrix is a 2— way array (order 2), and thus has 2 dimensions. A vector is an 
array of order 1, and a scalar is of order 0. 

Throughout this paper, and unless otherwise specified, variables take their 
values in the real held, although all the statements hold true in the complex held 
with more complicated notations; boldface lowercase letters, like u, will denote 
single- way arrays, i.e. vectors, whereas boldface uppercase letters, like G, will 
denote arrays with more than one way, i.e. matrices or many- way arrays. The 
entries of arrays are scalar quantities and are denoted with plain letters, such as 
m or Gijitf. 

A tensor of order d is a d— way array that enjoys the multilinearity property 
after a change of coordinate system. For instance, consider a 3rd order tensor 
T with entries , and a change of coordinates defined by 3 square invertiblc 
matrices, A, B and C . Then, in the new coordinate system, the tensor T' can 
be written as a function of tensor T as: 

^Ijk = AiaBjbCkcTabc (1) 
a br- 
ill particular, moments and cumulants of random variables may be treated as ten- 
sors [59] (McCullagh 1987). This product is sometimes referred to as the Tucker 
product [49] (Kohdis et al. 2000) between matrices A, B, and C, weighted 
by T. Note that tensors enjoy property ([T]) even if the above matrices are not 
invertible; only linearity is required. 

Tensor algebra is a well identified framework; in particular, two kinds of 
indices are distinguished, covariant or contravariant, depending on the role they 
play in the application under consideration: an array can indeed be seen as an 
operator from one space to another. For the sake of simplicity, we shall not pay 
too much attention to this distinction, although it turns out to be important in 
contexts other than the present one. 

2.2 Notation 

Given two arrays of order m and n, one defines their outer product C = A o B 
as the array of order m + n: 

Cij..lab..d = Aij..£ B a b..d (2) 

For instance, the outer product of two vectors, u o v, is a matrix. 
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Given two arrays, A = {Aij.j} and B = {Bi'ji.j/} of orders (1a and ds 
respectively, having the same first dimension, one can define the mode— I con- 
traction product: 



For instance, the standard matrix- vector product is A u = A T • u. Similarly, 
one defines the mode—p inner product when arrays A and B have the same pth 
dimension, by summing over the pth index; the product is denoted as 



If unspecified, the contraction applies by default to the first index. Some authors 
denote this product as Ax p B, but we find it less readable. 

We define the Kronecker product u v between two vectors u and v as the 
vector containing all the possible cross-products [3j (Brewer 1978). If u and v 
arc of dimension J and K , then u v is of dimension JK. 

Lastly, we define the symmetric Kronecker product of a K— dimensional 
vector w by itself, denoted w w = w 02 , as the K((K + l)/2— dimensional 
vector containing all the distinct products, with an appropriate weighting of the 
cross terms so that \\w w\\ = \\w w\\. The product w 0d is defined in a 
similar manner for d > 2. For instance, w 03 is of size K{K + 1)(K + 2)/6. 

The vecs{-} operator puts a K x K symmetric matrix in the form of a 
K(K + l)/2— dimensional vector; conversely, Unvecs{-} puts it back in matrix 
form. For instance, Unvecs{/ 02 } = f f T . Refer to gl] [70] (Van der Veen 
1996; Grellier and Comon 2000) for a more detailed description. 

2.3 Homogeneous polynomials 

d— way arrays can be written in two different manners, as pointed out by several 
authors [SO] (McCullagh 1987), related to each other by a bijective mapping, /. 

Assume the notations d = Yik=i x k° an d \ j\ = f J2kJk- Then for homogeneous 
monomials of degree d, x J , we have |j| = d. 

To start with, take the example of (K, d) = (4,3): one can associate every 
entry T^-/. to a monomial T^-fc XiXjXk- For instance, Tm is associated with 
Tuixfxi, and thus to Tn 4 a^ 2 ' ' ' 1 !; this means that /([1,1,4]) = [2,0,0,1]. 

More generally, the d— dimensional vector index i = [i,j, k] can be associated 
with a A'— dimensional vector index f(i) containing the number of times each 
variable Xk appears in the associated monomial. Whereas the d entries of i take 
their values in {1, . . . , K}, the K entries of f(i) take their values in {1, . . . , d} 
with the constraint that fk{i) — d, Vi. 

As a consequence, the linear space of symmetric tensors can be bijectively 
associated with the linear space of homogeneous polynomials. To see this, it 




A»B 



p 
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suffices to associate every polynomial p(x) with the symmetric tensor G as: 

P (x)= G i xf(i) ( 3 ) 

\f{i)\=d 

where Gi are the entries of G. The dimension of these spaces is S = ( K+ f^ 1 ), 
and one can choose as a basis the set of monomials: B(K; d) = {x- \ j\ = d}. 

Example. Let p and q be two homogeneous polynomials in K vari- 
ables, associated with tensors P and Q, possibly of different orders. 
Then, polynomial pq is associated with P o Q: 

P (x)q(x) = J2^PiQjX f{i)+m =Y f l P °Qhi] xf([ij]) 

i 3 [ij] 

In practice, it is convenient to take into account the symmetry of the tensor, 
by defining c(j) as the number of times the entry T^-iy) appears in the array 

T- c(j) = where d = YlkJk- For binary quantics, K = 2 and 

c(j) = (/J = (/J; for instance, for d = 4, c([3, 1]) = 4 and c([2, 2]) = 6. 

Coefficients j(j,p) of a polynomial p in basis 23 are chosen so as satisfy the 
relation: 

P( x ) = l(3,P)c{j)x : > 

\j\=d 

Now, both spaces can be provided with a scalar product. For d— way arrays of 
dimension K, define the Froebenius scalar product: 

(G,H)=Y^Gi Hi 

i 

and the induced Euclidian norm. For homogeneous polynomials of degree d in 
K variables, define the scalar product as 

(p,q) = X! c U)i(j,p)iU,q) 

\j\=d 

In particular, monomials in B satisfy Ha^H = (J)\/d\ = l/c(j). In the case of 
binary quantics (K = 2), this was called the apolar scalar product |51j (Kung 
and Rota 1984). The latter definition has several advantages [23] (Comon and 
Mourrain 1996). For instance, if a(x) is a homogeneous linear form with coeffi- 
cient vector a, then the scalar product (a d , q) with any homogeneous polynomial 
q of degree d turns out to be the value of q at a: 

(a{x) d , q{x)) = c(i) 7(1, q) a 1 = q(a) 

i 

The interest in establishing a link between tensors and polynomials lies in the 
fact that polynomials have been studied rather deeply during the last century 
jSB] (Salmon 1885). Some of the results obtained will be useful in this chapter, 
and in particular the classification of cubics. 
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2.4 Genericity 



A property will be referred to as generic if it is true on a dense algebraic subset. 
The topology used is the standard one for homogeneous polynomials, namely 
that of Zariski [53] (Shafarevich 1977). Recall that in this topology, the closed 
subsets are defined by algebraic equations of the form p(x) = 0, where p is a 
polynomial. Its particularity is that two open non empty subsets always intersect; 
in other words, the topology is not separated. 

For instance, a symmetric matrix is generically of full rank. In fact, the set 
of singular matrices is defined by the polynomial relation det(A) = 0, which 
is associated with a closed subset, whose complementary is dense. We shall 
subsequently see that this does not hold true anymore for tensors of order higher 
than 2. For instance, the 2x2x2 tensor T such that Ti 2 2 = T212 = T221 = 1, 
and zero elsewhere (cf. figure[T|), is known to be of rank 3. However, the generic 
rank is 2 in that case, as will be discussed in section |4~21 



Figure 1: Non generic example of a binary cubic of maximal rank: position of 
non-zero entries in the 2x2x2 associated symmetric tensor. 



2.5 Array ranks 

Let T be a tensor, not necessarily symmetric, of dimensions m X . . . X rid- One 
defines the tensor rank wofTas the minimal number of rank one tensors whose 
linear combination yields T. The properties of tensor rank will be extensively 
discussed in section l4~2l For completeness, let us also mention the definition of 
mode—n ranks. 

The mode—n vectors of T are obtained by varying index i n and keeping the 
others fixed; there are thus as many mode—n vectors as possibilities of fixing 
indices k n. The mode—n rank, R n , is defined as the dimension of the 
linear space spanned by all mode—n vectors of T. 

Bounds. Howell (1978) [43] showed that the tensor rank can be bounded 
as u> < max^j {riiUj}. On the other hand, mode—n ranks and tensor rank are 
related by the inequality R n < w, Vfc. 

In the symmetric case, Rcznick showed that the tensor rank can be bounded 
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as a function of the dimension K and the order, d: 



( 



K + d - 2 
d-1 



) 



(4) 



but this bound is rather loose, except in some very particular cases, as will be 
commented in section f4. 21 

3 Cumulants 

3.1 Definitions 

Let z be a random variable of dimension K, with components z,;. Then one 
defines its moment and cumulant tensors of order d as: 



When the moment tensors of order less than or equal to d exist and arc finite, the 
cumulant tensor of order d exists and is finite. Whereas moments are the coeffi- 
cients of the expansion of the first characteristic function $ z (it) = F,{exp(ju T z)} 
about the origin, where the dotless j denotes y/— 1, cumulants are those of the 
second characteristic function, ^ z {u) = log(& z (u)); for complex random vari- 
ables, it suffices to consider the joint distribution of their real and imaginary 
parts. Moments and cumulants enjoy the multilincarity property |T]) and may 
be considered as tensors [SH] (McCullagh 1987). 

One important property of cumulant tensors is the following: if at least two 
variables, or groups of variables, among {z±, ..Zk} are statistically independent, 
then all cumulants involving these variables arc null. For instance, if all the zi 
are mutually independent, then Cf- t = 6(i,j, A [35] (Kendall and Stuart 

1977), where the Kronecker S is null unless all its arguments are equal. This 
property is not enjoyed by moments, hence the interest in cumulants. 

The reverse is not true. In fact, unless the random variable z is Gaussian, 
an infinite number of cumulants must vanish in order to ensure their strict sense 
independence. Therefore, when a cumulant tensor C z of order d is diagonal, we 
shall say that the random variables Zi are independent at order d. 

Gaussian variables play a particular role, since they are the only random 
variables that have a finite set of non-zero cumulants [17] [M] (Kagan et al. 
1973; Feller 1968). The cumulants of order 1 and 2 are better known under the 
names of statistical mean and covariance. To illustrate this property, I looked 
for a long time for a simple non Gaussian random variable having null cumulants 
of order 3 and 4. Take a random variable with values in the complex plane. If 
its distribution is with probability one half, and uniformly distributed on the 
unit circle with probability one half, then its mean E{z} is zero, its variance 
E{zz*} = 1, its second-order non-circular cumulant E{z 2 } = 0, its 2 marginal 




^{ z i 1 z i 2 ■ ■ ■ z id\ 

GvLm{z il , Zi 2 , . . . 
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cumulants of order 3, E{z 3 } and E{z 2 z*}, are zero, as well as its 3 marginal 
cumulants of order 4, Cum{z, z, z, z}, Cum{z, z, z, z*}, Cum{z, z, z*, z*}. Yet, 
this variable is obviously not Gaussian. This is a very striking example, that one 
encounters in secondary radar applications. 

3.2 Blind Source Separation 

Consider the linear statistical model 

y = Ax + v (5) 

where y is an observed random variable of dimension K , a; is a random vector of 
dimension P referred to as the source vector, A is a mixing matrix, and v stands 
for background noise, possible interferers, and measurement errors, independent 
of x. 

The Blind Source Separation (BSS) problem consists of estimating the mixing 
matrix, A, and possibly the corresponding estimates of x, solely from measure- 
ments of y. In the classical BSS framework, the components Xi of x are assumed 
to be statistically independent (generally not in the strict sense because a weaker 
independence is sufficient) [TBJ (Comon 1994). In some cases however, sources 
Xi may be correlated [70J [3S] [2JJ (Van dcr Veen 1996; Grellicr and Comon 
2000; Comon and Grcllier 1999), as we shall subsequently sec, and this is not 
necessarily an obstacle to their separation. 

To fix the ideas and simplify the notation, assume sources Xi are independent 
at order 4. Then, from the properties of cumulants we just described, we have: 

p 

Cfjke = AipA JP A kp Ai p Cp ppp (6) 
p=i 

up to an additive noise term, C v . From measurements of y, it is possible to 
estimate the cumulant tensor C y . Estimating A then amounts to finding the 
decomposition In practice, because of the noise v, this decomposition is not 
exact. In addition, since the only property utilized is the source independence 
at a given order, matrix A can only be identified up to a multiplicative factor 
JPA, where P is a permutation and A is diagonal invertible; see identifiability 
issues in [16] [5] (Cao and Liu 1996; Comon 1994). 

Blind Deconvolution is related to the above BSS modeling in two respects. 
First, a convolution with a finite impulse response can always be written as the 
product with a Toplitz matrix, which means that the modeling ((SJ still holds 
valid, provided matrix A is subject to the Toplitz structure [H] [3D] (Comon 
1994; Grigorascu and Regalia 1998). Second, if the source process is linear, then 
extracting the sources is equivalent to computing the linear prediction residue 
[T7] (Comon 1994). Then, the problem reduces to an unstructured static sepa- 
ration as in ([5]). 
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4 Array decompositions 



4.1 Diagonalization by change of coordinates 

Preprocessing for square mixtures. If the mixing matrix A is square and 
invcrtiblc, which means that the number of sources, P, is equal to the observa- 
tion dimension, K, then the BSS problem may be seen as a bijective congruent 
transformation (ICA). 

Denote R y the covariance matrix of the observation. The goal is to find an 
estimate z of x such that its components Zi are statistically independent. The 
first idea is thus to build a vector y = Ty that has a diagonal covariance, yielding 
decorrelated components. This can be easily done by searching for a (non unique) 
square root factor of R y ; it can be obtained by a Cholesky factorization or by 
an Eigen Value decomposition (EVD) of R y . We then define T as the inverse of 
this factor, so that TR y T T = I. 

With this preprocessing T, the obtained random variable y has a covariance 
equal to identity. We say that this variable is standardized. 

Now, it may be more appropriate, when the noise covariance, R v , (or con- 
versely the signal covariance R s = Ry — i? tI ) is known, to build T as the inverse 
of a square root of the signal covariance: TR S T T = I. In fact, this yields an 
unbiased solution in the presence of noise. Unfortunately, neither R v nor R s are 
known in general, hence the former procedure based on R y . 

Preprocessing for rectangular mixtures. In practice, one can always re- 
duce the problem to the latter when the number of sources, P, is smaller than the 
observation dimension, K, in the absence of noise, or when the noise covariance 
is known. This is now explained below. 

If noise is present, denote T v the inverse of a square root of R v , such that 
we have T V R V T V T = a v I; if noise is absent, set T v = I and a v = 0. Now 
consider the matrix T v R y T v = T V R S T V + al. Its EVD allows to detect the 
number of non-zero eigenvalues in R s [2] (Bienvenu and Kopp 1983), equal to P 
by definition, as well as to estimate the source space spanned by the associated 
eigenvectors: T v R y T v = UYXJ T + al. The matrix U is here of dimension 
K x P and of full rank. The preprocessing defined as T = U T T V eventually 
yields a P— dimensional standardized vector y = Ty whose noiseless part has a 
unit covariance, as in the previous paragraph. 

Lastly, if the mixture is rectangular, but with more sources than sensors, i.e., 
P > K, the mixture cannot be linearly inverted. Such mixtures are referred to as 
underdetermined or over-complete, as already pointed out in the bibliographical 
survey, and their identification will be addressed separately in section 14.21 In 
such a case, the preprocessing is unuseful, and not recommended. 

Orthogonal change of coordinates. In the preprocessing, we have done 
only part of the job. In fact, we have constructed a matrix T such that ideally 
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TAA T T T = I, but this only implies that TA = Q T , for some P x P orthogonal 
matrix Q. This Q factor still remains undetermined. It is thus necessary to 
resort statistics of order higher than 2, namely 3 or 4, unless other hypotheses 
can be assumed. The choice between these two possibilities depends on the 
conditioning of the problem, directly linked to the value of the diagonal tensor 
C x . At order 3, this tensor vanishes for all symmetrically distributed sources, 
which strongly limits its use. At order 4, this tensor is generally non zero, except 
in some exceptional pathological cases, as that mentioned in section f3. 11 

In order to find Q, one can attempt to diagonalize (approximately) the cumu- 
lant tensor of z = Qy, Cf jkl = J2 pqr s QtpQjqQkrQesC% qrs . The random variable 
z is eventually an estimate of the source vector x; in the absence of noise, we 
have z = PAx. Because Q is orthogonal, minimizing the non diagonal entries 
is equivalent to maximizing the diagonal ones |15j (Comon 1991), so that Q can 
be determined by 

Q = Arg Max T Q , 4 ; T Q , 4 = £ \C* ui \ a (7) 

i 

where a > 1. Several optimization criteria of this type, called contrasts, have 
been proposed [E] [60] [26] [8] [20] (Comon 1994; Moreau and Pesquet 1997; 
DcLathauwer 1997; Cardoso 1999; Comon 2001) and are justified by Information 
Theory arguments. Contrary to the matrix case [38] (Golub and Van Loan 1989), 
it is generally impossible to exactly null the non diagonal entries of a symmetric 
tensor of order higher than 2, by just rotating the coordinate axes. In other 
words, the class of decompositions presented in this section lead to rank— A' 
approximations of A'— dimensional symmetric tensors. More will be said in the 
next section. Numerical ICA algorithms are surveyed in section [5] 

4.2 Decomposition into a sum of rank— 1 arrays 

When the number of sources, P, is strictly larger than the observation dimen- 
sion A, the previous approach does not apply. In fact, the matrix A now has 
fewer rows than columns, and the noiseless relation y = Ax cannot be linearly 
inverted. In other words, A must be identified without attempting to extract 
the sources x p . A symmetric tensor of order d can be expressed via a Canonical 
Decomposition (CAND) of the form: 

CV = H 7(p) a (p) °a(p)o o(p) o a(p) (8) 

p=i 

The number of terms, to, reaches a minimum when it equals the tensor rank. This 
CAND decomposition allows the identification of matrix A if: (i) it is unique up 
to AP — indcterminations, and (ii) the tensor rank lu is larger than or equal to 
the number of sources, P. 
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Generic rank. We report in the tables below the generic value of the tensor 
rank as a function of the dimension K and the order d |24] (Comon and Mourrain 
1996). We also report the dimension D of the manifold of solutions; when it is 
zero, it means that there are a finite number of CAND (at most d s ), and there 
is a chance of identifying matrix A this way. 

Example. Fore matrices (d = 2), it is known that a quadratic form 
cannot be uniquely decomposed into a sum of squares. The manifold 
of solutions is of dimension D = K(K — l)/2). 
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Table 1: Generic rank lo of symmetric tensors as a function of the dimension K 
and the order d 
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Table 2: Generic dimension D of the manifold of solutions 

The first striking fact that appears in table [T] is that the rank can exceed 
the dimension, which is not true for matrices. For instance, it can be seen that 
P = 5 sources can be identified in dimension K — 4 with a 3rd order cumulant 
tensor, whereas this number increases to P = 10 with a 4th order tensor. 

One can also deduce from table [2] that 3rd order tensors have a finite number 
of CAND for even dimensions. For 4th order tensors, this is satisfied for dimen- 
sion 7, but not for lower ones. This is unfortunate, for 4th order cumulants are 
very often better conditioned than 3rd order ones. Furthermore, most of the 
proofs leading to these tables are not constructive. The only known constructive 
result is given by the Sylvester theorem f section [5.3p . 



QT— orbit 


uj(p) 


x 3 


1 


x 3 + y 3 


2 (generic) 


x 2 y 


3 



Table 3: Equivalence classes of binary cubics: orbits under the action of QI, the 
group of invertiblc 2— dimensional changes of coordinates. 
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Non generic rank. In addition, these results are only valid in generic cases. 
And it turns out that, contrary to matrices (i.e., 2nd order tensors), the generic 
rank is not always maximal. In other words, the rank can exceed its generic 
value. Unfortunately, the maximal achievable rank is not known for all pairs 
(P, K). But we can still illustrate this odd fact with particular values. 

Example. For instance, for K = 2 and d = 3, the maximal rank is 
3. The symmetric tensors having rank 3 are associated with poly- 
nomials in the orbit of x 2 y. The tensor associated with the latter 
homogeneous polynomial is represented in figure [I] where only 3 en- 
tries are equal to 1, the others being null. As reported in table [3] 
there is a single class associated with every value of the tensor rank. 

Now to make it more explicit, the polynomial x 2 y can be written as: 

6 x 2 y = (x + y) 3 + (-x + yf - 2y 3 
This relation can be rewritten in tensor form as: 

o3 , -, \ °3 / \ °3 

t= i : i + 1 ~ ] -2 



1 



which is an explicit irreducible CAND. This decomposition is de- 
picted in figure [21 Also note that in this case, the Reznick bound (j4]) 
is reached: u) = (|) = 3. 



> 


> — 









£3? 



+ 2 



Figure 2: Explicit decomposition of the non generic example of binary cubic of 
maximal rank. Black bullets represent +l's and white bullets — l's. 



Example. Now take K = 3 and d = 3. We are thus handling 3x3x3 
symmetric tensors, or equivalcntly, ternary cubics. The generic rank 
is 4, but the maximal rank is 5, according to table |H The class of 
maximal rank is unique, and a representative is depicted in figure 
[3l the 6 non-zero entries are all equal. Note that other non generic 
classes occur with also a rank of 4, as pointed out in tabled) 

Example. Finally, consider ternary quartics, i.e., (K,d) = (3,4). In 
this case, the number of free parameters in the tensor is S = (4) = 15. 
The number of free parameters in CAND exceeds 15 as soon asw > 5. 
So we could hope that we are lucky, because the number of free 
parameters is the same on both sides of CAND. Unfortunately, this 
is not the case, and Clebsh showed that the generic rank was 6 [33] 
(Ehrenborg and Rota 1993), as reported in table [U 
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QT— orbit 


w(p) 


x 3 




1 


x 3 + y 3 




2 


x 2 y 




3 


x 3 + 3y 2 z 




4 


x 3 + y 3 + 6 xyz 




4 


x 3 + 6 xyz 




4 


a (x 3 + y 3 + z 3 ) - 


- 6b xyz 


4 (generic) 


x 2 y + xz 2 




5 



Tabic 4: Equivalence classes for ternary cubics: orbits under the action of Q2, 
the group of invertible 3— dimensional changes of coordinates. 



k 

i 




Figure 3: Non generic example of ternary cubic, proved to be of maximal rank: 
position of non-zero entries 

4.3 Rank— 1 approximation 

Approximating a tensor by another of rank 1 has at least two applications in 
the present context. The first one is encountered when when P < K and when 
the source extraction is performed one source at a time in model (JSJ), contrary 
to section |4T] this is referred to as a deflation procedure. 

The maximization of the contrast {7} then reduces to that of a single output 
standardized cumulant (here the kurtosis), because a single unit-norm vector is 
sought, instead of a whole orthogonal matrix: 

w = Arg Max } WiWjW k wtCf- kP (9) 

||1B||=1 ^ 

ijkl 

Yet, it has been shown [28] [18) [49) (DeLathauwer Comon and others 1995; 
Comon 1998; Kofidis and regalia 2000) that this maximization problem is equiv- 
alent to minimizing \\C y — a w o w o w o w\\, which is simply finding the best 
rank— 1 approximate of tensor C v . 

The second application is found in analytical BSS when sources are discrete 
[3"9"] (Grellier and Comon 2000) or of constant modulus [7D] (Van der Veen 1996). 
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In this problem, we have to solve a system of N equations of the form (f T y n ) d = 
lj 1 < n < N . This is equivalent to solving a larger linear system Y f 0d = 1, 
under the constraint of Unvecs{/ 0d } being a rank— 1 tensor. Denote {u® d , 1 < 
q < P} a basis of Ker{y}. The solution to this system takes the form 

p 

f 0d_ f 0d , \ ,.0d 

J —J min < / j A P u p 

■p=l 

where /®*n ^ s ^ ne minimum norm solution. Unfolding these vectors in tensor 
form leads to the relation 

P 

F = F min + ^\ p U p (10) 
P =i 

This problem can be shown to be related to the rank— 1 combination problem 
that we describe below. 

4.4 Rank— 1 combination 

The rank— 1 combination problem consists of finding the numbers X p so that, 
given matrices U p , matrix X P U P has a rank of 1. Up to now, this problem 
has spawned solutions that are not entirely satisfactory. As a consequence, so 
are the solutions to (jTOJ) . 

Incidentally, we can restate the Joint Approximate Diagonalization (JAD) 
problem addressed in [9] (Cardoso and Souloumiac 1993) for the BSS into rank— 1 
combinations. 

The Joint Approximate Diagonalization of P matrices N p consists of 
finding a square matrix T such that N p m TA p T T , for all p, where A p are 
diagonal matrices. From a property recalled in section \2.2l this relation can be 

rewritten in vector form as vecs{iV p } ^ n p w J2i X P itf 2 1 ti denoting the ith 
column of T . If the matrix [X P i] is full rank and has more columns than rows, 
then there exists a matrix B such that t® 2 rss ^2 p B p jn p . Thus, given matrices 
N p , the problem is to find for every j, scalar coefficients (3 P such that ^2 p (5 p N p 
is a rank— 1 matrix, and hence the link with the rank— 1 combination problem. 

However, the two problems are not equivalent, for matrix B is not necessarily 
square. 

5 Numerical algorithms 

5.1 Contrast maximization 

The ICA diagonalization of section |4~T1 (as well as the JAD briefly mentioned in 
section FO)) can be solved entirely analytically in dimension K — 2, in a number 
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of instances. In order to exploit this property, Comon [13] [13] (1991) proposed 
a sweeping of the pairs of indices, in a similar manner as in the Jacobi diagonal- 
ization algorithm for Hcrmitian matrices [35J (Golub and Van Loan 1989). This 
idea has been later applied to JAD by Cardoso [S] (Cardoso and Souloumiac 
1993). To see this more in detail, consider the Givens rotation 

n _ ( cos0 smcj) exp(j£>) \ 

I — sin0 exp(— jO) cos(f> J 

where the angle <f> is imposed to lie in the interval (— it/2, tt/2], because of 
inherent AP— indeterminacies. Thus this matrix is entirely defined by the vector 
u = [cos 2(f), sm2cf> cos 8, sin 2(f) sin 6*]. Now, as in ([7|), define the contrast T ai d as 
the sum of the d— th order tensor diagonal entries raised to the power a. Then 
it can be shown that Ti,3 and Ti,4 are real quadratic forms in u, and can thus 
be easily maximized with respect to «, and hence to (9, (f>) (by convention, if 
a = 1, the absolute value is dropped in (JT))). On the other hand, this holds true 
for T 2j 3 but not for T 2i 4, which can be shown to be a quartic [T3] [10] (Comon 
1994; Comon 2001). Nevertheless, polynomials of degree 4 can still be rooted 
analytically. 

The procedure originally proposed by Comon (1989) consisting of sweeping 
all the pairs, like in some numerical algorithms dedicated to matrices, has never 
been proved to always lead to one of the AP— equivalent absolute maxima, even 
if this is always observed in practice. Counter-examples have never been found 
either. So we consider this convergence issue as an open problem, belonging to 
the general class of optimization problems over multiplicative groups. However, 
some elements of convergence are now reported below. 



Convergence. For compactness, denote G the cumulant tensor of the stan- 
dardized observation, y, which has been denoted up to now. Also denote 
Z = C z the cumulant tensor obtained after an orthogonal transformation Q. 
According to the multi-linearity property, we have that: 

Zpq..r — ^ ^ QpiQqj • • • Qr£ (H) 

ij-t 

Consider first the matrix case (order 2) in order to fix the ideas. The contrast 
([7]) can then be written as: 

T 2 , 2 = ^|Z pp | 2 (12) 

p 

Because Q is orthogonal, its differential can be written as 

dQ = dSQ (13) 

where matrix S is skew-symmetric. This yields the relation characterizing sta- 
tionary points, Z: \dT^,2 = 2J2 P t ZppS p tZ tp = 0. Yet, this is true for any 
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7 d 2 T 2 , 2 = AZ 2 qr - (Z qq - Z rr f (15) 



skew-symmetric matrix S, and hence for every skew-symmetric matrix having 
only two non zero entries (one +1 and one —1); based on this argument, one 
concludes that: 

(Z qq - Z rr )Z qr = 0, for q ^ r (14) 

Next, the local convexity can be examined with the help of the same tools, 
observing that: 

1 

T 

Thus, there are three kinds of stationary points: (i) those for which all diagonal 
entries are equal, which correspond to minima of T2,2, (h) those for which all non- 
diagonal entries arc null, which correspond to maxima, and (iii) saddle points, 
for which some diagonal entries are equal and some non-diagonal entries vanish. 
This result is well known, and proves that the only maxima are diagonal matrices, 
which can be deduced from each other by mere permutation within the diagonal. 

Now let us develop the same calculations for tensors of order 3 and 4. Sta- 
tionary values are given by the relations: 



r ^^2,3 — 3 Zp P p dSpt Z tpv — 0, 



— dT2A = 4 Zpppp dSpt Z tpp p = 
P,t 

or, on the basis of skew-symmetric matrices, for q ^ r: 

ZqqqZ qqr Z rrr Zq rr 0, (1^) 

ZqqqqZqqq r Z rTrr Zq TTr — 0, (1^) 

whereas local convexity conditions are governed by (Comon 1994) : 

— d 2 T3 = AZ qqr +4Z qrr — (Z qqq — Z qrr ) 2 — (Z rrr — Z qqr ) 2 (18) 

— d T4 = —Z qqrr J r 4:Z qqqr +4:Z qrrr — (Z qqqq — —Zqqrr)"— (Z rrrr — —Z qqrr ) (19) 



The comparison of these results with fTi]) and (|15p lead to two conclusions: 
(a) non-diagonal terms do not factorize anymore in (fl"6)) and (flT]) . so that sta- 
tionary values are more difficult to characterize, and (b) diagonal tensors are 
still local maxima, but there are a priori others. This is another problem, linked 
to optimization in groups, that this author considers as open. 



Sweeping strategies. We have presented several numerical algorithms aiming 
at separating P = 2 sources from K = 2 sensors in the presence of noise of 
unknown statistics. Inspired from the Jacobi cyclic-by rows sweeping strategy 
proposed for matrices, we can process all the K(K — l)/2 pairs one by one 
sequentially (Comon 1989; Comon 1994). However, as in the matrix case, the 



1G 



noise part (constituted by the actual background noise and all the other K — 2 
sources) changes at every step, so that a single sweeping is not sufficient. In 
practice, an order of \[K sweeps have been shown to be sufficient. 

Other strategies have been analyzed, and consist of processing first the pair of 
sensors that yields the maximal increase in the contrast criterion. This strategy 
has also been implemented successfully, but is not always numerically efficient. 

When processing one pair one can either recompute all the entries of 

the cumulant tensor that have been affected (i.e., those whose indices contain i 
or j), or compute the rotated data instead. The two possibilities do not have 
the same numerical complexity, and the best choice depends on the number of 
sensors, K, and on the number of samples, N. 

5.2 Parafac algorithm 

In [54] (Leurgans et altcrac 1993), SVD-based algorithms arc proposed to com- 
pute CAND of 3rd order tensors in larger dimensions. However, these algorithms, 
called Parafac, need the number of sources, P, to be smaller than or equal to 
| K — 1, in the symmetric case we are interested in. See also [50] [4] (Kruskal 
1977; Bro 1997) for more details. In view of table [T] reported above, this value 
of P is strictly smaller than the generic rank, co, except for (d, K) = (3, 2) or 
(d,K) = (3,4). As a consequence, Parafac algorithms can only approximate 
d— way arrays, in general. 

In the unsymmetric problem, the goal is to find three matrices, A, B, and 
G, such that Gijk = ^2 p A ip Bj p Ck p - One possible numerical algorithm is based 
on alternating least squares, as explained below for 3— way arrays [llj (Carroll 
and Chang 1970): 

• Start with (A(0), JB(0), (7(0)) 

• Define matrices G (1) , G (2) , G (3) : 

G ijk = G« = Gg } = G^; V = (jk), q = (ifc), r = (ij) 

• Estimate stage t + 1 from stage t by pseudo-inversion: 

— Update mode 1: 

A{t + l) = G {1] [B(ty C(t) T ]~ 

— Update mode 2: 

B(t + 1) = G (2) [A(t + 1) T c{ty\- 

— Update mode 3: 

C(t + 1) = G (3) [A(t + 1) T B(t + 1) T ]- 

wherc M~ denotes the Moore-Penrose pseudo inverse of M. See also [4] [26] 
[50] (Bro 1997; DeLathauwer 1997; Kruskal 1977) for more details on Parafac 
algorithms. 



17 



5.3 Sylvester theorem 

As already pointed out earlier, a rank-one tensor is associated with a linear form 
raised to the dth power. In terms of polynomials, the CAND decomposition can 
thus be rephrased: how can one decompose a quantic into a sum of dth powers of 
linear forms [23J (Comon and Mourrain 1996) ? This is this topic that addresses 
this theorem, restricted to the binary case however (i.e., two variables). 

Theorem 5.1 A binary quantic p[x,y) = ~^2^ =0 % c(i) x l y d ~ l can be written as 
a sum of alth powers of tu distinct linear forms: 

3=1 

if and only if (i) there exists a vector g of dimension ui + 1, with components gi, 
such that 

7o 7i ••• lw 

ld-u> ■ ■ ■ 7d-l Id 
and (ii) the polynomial q(x,y) = Y^e=o 9t x ^ U ul ~ f ' admits uj distinct roots. 

Sylvester's theorem not only proves the existence of the uj forms (second 
column in the tables), but also gives a means to compute them [18] [24] (Comon 
1998; Comon and Mourrain 1996). For odd values of d, we have thus a generic 
rank of uj = 2±1 5 whereas for even values of d, w = | + 1. So when d is odd, 
there is generically a unique vector g satisfying (|20[) . but there are two of them 
when d is even. This theorem shows that in column K = 2 of table [2j we have 
D = when d is odd, and D = 1 when d is even. 

In [27] (DeLathauwcr Comon and DeMoor 1999), several extensions to this 
theorem are proposed in the complex case. The basic idea remains the same, 
but the result becomes more complicated. 

The disappointing fact is that Sylvester's theorem cannot be extended to 
dimensions higher than 2. In fact, a key step in the proof [23] [H] (Comon and 
Mourrain 1996; Comon 1998) is that for any polynomial p of degree d, and any 
monomial m of degree d — to, there exists a polynomial q of degree u> such that 
qm is orthogonal to p. Equation (|20p expresses that orthogonality in terms of 
polynomial coefficients. It is clear that this holds true only when d > uj, which is 
unfortunately satisfied only in the binary case, according to tablc[T] Possibilities 
of extension to more than 2 variables is discussed in |24j (Comon and Mourrain 
1996). 

Simultaneous CAND. Let us go back to table [2] Among others, this tabic 
reports that there are infinitely many CAND for even orders, d. In order to 
fix this indeterminacy in the case (d, K, P) = (4, 2, 3) (the manifold of solutions 
is of dimension 1 in that situation), it is proposed in [18] (Comon 1998) to 
simultaneously diagonalize a second cumulant tensor of order 4. 



9 = 0. 



(20) 
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The help of virtual sources. In [TS] [H] (Comon 1998; Comon and Grellier 
1999) an algorithm dedicated to discrete sources is proposed, and performs both 
the identification of A and the extraction of sources Xj, in the case (d,K,P) = 
(2,2,3). 

In a few words, assume three sources x\ are mixed and received on two 
sensors, and assume these sources are all distributed in { — 1, +1} (they are called 
BPSK in digital communications). One can prove, if sources Xj are statistically 
independent, that the "virtual" source x\Xix-i is also BPSK-distributed, but 
obviously statistically dependent of the three former ones. However, one can 
still prove that all its fourth-order pairwise cross-cumulants vanish. Yet, only 
pairwise cumulants are utilized in the sweeping strategies maximizing contrasts 
such as T 2 ,4 in ([7]). As a consequence, viewed by the algorithm, sources are 
independent; one can thus build from y T — [2/1,2/2] virtual measurements yf, 
?/i 2/2, 2/12/2 j an d 2/2 j that can be modeled as linear mixtures of 4th order pairwise 
independent unknown sources. This allows the separation of the four sources 
(three actual and one virtual) from six sensors (two actual and four virtual). 

5.4 Rank-one approximation 

The rank— 1 approximation problem (section 14. 3|) has been partly solved by al- 
gorithms inspired from the matrix power method and devised for arrays of higher 
orders [55] [IS] (DeLathauwer Comon and others 1995; DeLathauwer 1997; 
Kofidis and Regalia 2000). 

Criteria. Given tensor C v , the goal is to find a vector w minimizing: 

Sl = \\C V — aw o w o w o w\\ (21) 

for some scalar number a. One can prove that minimizing (|2ip is equivalent to 
maximizing [T5] (Comon 1998): 

n d = \\C v »ww...»w\\ (22) 

or to minimizing: 

Q d _ 1 = \\C y »w...»w-Xw\\ (23) 
However, the other criteria Q r , < r < d — 1, are generally not equivalent. 

Stationary uplets (v, A) of il , ftd-i or fid are the same and satisfy: 

C V > 8I. . .«» = Xv 

d—1 times 

this suggests a Rayleigh-like iteration, tat we can call the Tensor Rayleigh sym- 
metric iteration: 

w <— C y • w • . . . • w 

d—1 times 

W <— tu/||i«|| 
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In [25J (Delathauwer Comon et al. 1995), it is suggested to run a non symmetric 
iteration, and to initialize the algorithm with the HOSVD. 

The rank— 1 combination problem (section [4. 4p has been solved in a sub- 
optimal way up to now in [70] [39] (Van der Veen and Paulraj 1996; Grcllier 
and Comon 2000) by solving a large unconstrained linear system, and trying to 
restore the structure afterwards. The optimal one-stage solving still remains to 
be devised. 

6 Concluding remarks 

In this chapter, we have partly surveyed the tools dedicated to tensor decomposi- 
tions, mainly through the problem of source separation. Thus, this presentation 
has been restrictive, but hopefully still informative. 

Many other source separation algorithms do not resort to tensor tools, and 
have not been reported here. It is worth noting that some of them do not need 
the sources to be statistically independent, so that the output cumulant tensor 
is not even aimed at being diagonal. Instead, other properties of the sources can 
be exploited, such as their discrete character, or their constant modulus [70] [66] 
[55] (Van der Veen and Paulraj 1996; Talwar Viberg and Paulraj 1996; Grellier 
and Comon 2000). When more sources than sensors are present, general results 
state that it is sometimes possible to identify the mixture, but source extraction 
requires more knowledge about the sources (e.g., their distribution). These issues 
have been tackled herein. Let us now turn to research perspectives. 

In the area of source separation, current hot research topics include (i) blind 
identification of under-determined mixtures, (ii) blind equalization of convolutivc 
mixtures, (iii) the theoretical proof of convergence of pair-sweeping algorithms, 
and, in the context of telecommunications, (iv) handling properly carrier residu- 
als when present in the measurements. In all cases, analytical block-algorithms 
are suitable when computer power is available and when the stationarity duration 
is short. 

As far as tensors are concerned, open research directions include: (i) the 
determination of the maximal achievable rank for arbitrary order and dimen- 
sions, (ii) the actual calculation of general Canonical Decompositions for K > 2, 
(iii) efficient numerical algorithms for computing an approximate of given rank. 
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